Assembly Language for Veggies (And C programmers) Part 1.
So you wanna be an Assembly Language programmer? OK, no problem! This doc is
designed to introduce you to the basics of ASM and the concepts behind it. I
will be providing examples and some demo routines along the way, along with
cross references and examples from other languages to clarify certain points.
OK, so here goes...
When you program in assembly language, you have complete and utter control of
the computer, and everything it does. YOU get to choose EXACTLY its behavior
under your program. You can directly access any hardware and do anything - the
only limit is your skill.
WHAT YOU GET
Basically, assembly programs talk directly to the 8088, 8086, 80188, 80186,
80286, 80386, or 80486 IC inside your machine. This is a chip designed by
Intel and is called the CPU (Central Processing Unit). We begin by looking
into these chips. Your machine, depending upon model, will use one of them.
XT's have either 8088's, 8086's or their NEC clones, the V20's and V30's (the
NEC chips are 100% compatible), whilst the AT's use the 80286 chip. The 8018x
series turns up only rarely, but it is used! The newer fast machines use 80386
or 80486 chips, and that's where they get their names.
All the chips are "Upward Compatible" - that means that anything the 8088
could do, all the later chips can do too, only faster. The 186 and 286 added
more instructions - the 386 & 486 can do those as well. So you see that the
486 is king of the mountain, but will do the exact same job as an 8088 (only
about 30 times faster!) if required.
Because of this upward compatibility, we can write a program that works on an
8088 and expect it to execute correctly on any IBM design, regardless of CPU,
UNLESS we use instructions specific to one of the later chips (which is
nearly never).
So, to program these chips, one requires an understanding of them.... Here
goes. The chip has the ability to execute machine code instructions. This is
the most important job of the chip. It reads an INSTRUCTION from computer
memory, figures out what the instruction means, and executes it, then gets the
next instruction. That is ALL that a CPU is capable of doing!!!! As long as a
computer is operating, it is doing this...from the first second you switch it
on, until you switch it off again....
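
To make that fetch / figure out / execute cycle a bit more concrete, here's a
rough sketch in Python. The three instructions (LOAD, ADD, HALT) and the single
"acc" register are made up for illustration - this is NOT the 8088 instruction
set, just the shape of the loop the chip runs.

```python
# Toy model of the fetch / decode / execute cycle.
# The instruction names and the single "acc" register are hypothetical,
# invented for this sketch; a real 8088 reads encoded bytes from memory.

def run(memory):
    ip = 0    # instruction pointer: where the next instruction lives
    acc = 0   # one working register
    while True:
        opcode, operand = memory[ip]   # fetch the instruction at ip
        ip += 1                        # move on to the following one
        if opcode == "LOAD":           # decode, then execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown instruction {opcode!r}")


program = [("LOAD", 5), ("ADD", 7), ("HALT", 0)]
print(run(program))   # prints 12
```

Take the HALT out of the toy program and the loop runs off the end of memory,
which is about as close to a "crash" as this model gets.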
Even when a machine has "Crashed" it can still be doing something - and
usually is - but what it is doing is useless and won't allow the operator a
chance to send it instructions to tell it to stop its useless activity. The
only way to stop a CPU from doing its job is to HOLD the RESET button on the
computer down, or to switch power off.
Thus you see, you must have a logical set of instructions with a correct start
point, and a correct end point. The CPU keeps track of what it is doing with a
set of REGISTERS. The registers are of utmost importance to the programmer,
for without them he would be lost.
Here are the registers of the 8088 series (common to all models):
    AX, BX, CX, DX     SP, BP, SI, DI     CS, DS, ES, SS     IP, F
The letters are the standard reference names used by common agreement. All
registers are 16 bits wide - that is, they can hold a number from 0 to FFFF
hex. They are grouped according to use:
IP - Instruction Pointer, is used internally by the CPU to keep track of what
instruction it should execute NEXT....IE a marker of where in memory it is up
to.
F - Flags, also internal to the CPU, is a set of 1 bit markers that can be
either 0 or 1 to indicate a certain CPU status. The CPU has a set of
instructions built in that test individual flag bits.
CS - code segment, the memory segment of the executing program. (more on
segments to come in a tic..) - this will be set upon startup of your program
and is usually NEVER touched.
DS - data segment, the default segment to get data from - used by
some instructions for transferring data about in memory.
ES - Extra segment - same as DS, but totally user definable.
SS - Stack segment - like DS, but only for stack operations. Not normally
touched by the user.. see section on stack.
SP - Stack pointer - a bit akin to IP, but for stack operations.
BP - Base pointer - general 16 bit register for user usage.
SI - Source index - used by some instructions for data transfer. For user
usage.
DI - Destination index - same as SI.
AX - Accumulator. 16 bit general register for user usage. Multiply, divide
and a few other instructions always work through this register.
BX - Base - general register for user usage - also used as a base address
when referencing memory.
CX - Count - general register for user usage - also used in some block
movement operations as a loop counter.
DX - Data - general register for user usage - also used for I/O port
addresses and 32 bit math operations.
To keep things flexible, AX, BX, CX and DX can each be divided into two 8 bit
registers... Note: These are not extra, separate registers, simply a way of
accessing the same register 8 bits at a time!! The 8 bit versions are called
AH and AL, BH and BL etc... not too surprisingly, AH is the top (High) 8 bits
of AX, whilst AL is the bottom (Low) 8 bits...
Thus a program that stores 67ac into AX could just as easily store 67 into AH
and ac into AL - it would result in the same thing - AX would now equal 67ac.
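
If you want to see the AH/AL business as plain arithmetic, here's a small
Python sketch. The function names split_ax and join_ax are just made up for
this example; the CPU does all this in hardware, there's nothing to call.

```python
def split_ax(ax):
    """Return (AH, AL): the top and bottom 8 bits of a 16 bit value."""
    ah = (ax >> 8) & 0xFF   # top 8 bits
    al = ax & 0xFF          # bottom 8 bits
    return ah, al


def join_ax(ah, al):
    """Rebuild the 16 bit value from its two 8 bit halves."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


ah, al = split_ax(0x67AC)
print(f"{ah:02x} {al:02x}")            # prints 67 ac
print(f"{join_ax(0x67, 0xAC):04x}")    # prints 67ac
```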
One important concept to be grasped is that the registers are just like
pigeon holes.... they just hold a number. That number can be an address, the
ASCII code for a letter, the result of a math instruction or whatever. The CPU
only knows it's got a number... thus, there's no such thing as:
Var
cx : word;
al : char;
or similar... It overcomes a big hassle in many languages... In PASCAL one
can't take a number variable and drop it into the middle of a string, one must
use the STR() function... not in ASM... one just umm.... uses it! Thus there
are no "conversion" functions built in, or needed.... makes things a LOT
simpler at times!
As you have gathered, the 8088 series are 16 bit CPU's - called this because
all the registers are 16 bit, and the data paths inside the chips are 16 bit
also! (Funny 'bout that)... BUT they were designed to use up to 1 MB of
memory. (Take my word for it) .... The problem is that 1 Meg requires 20 bits
to count up all the combinations... How does one count to 20 bits with 16 bit
registers? Impossible! - YES!! .... so the designers thought that instead
of inventing a 20 bit CPU they'd design SEGMENTATION. This is one thing new
programmers come to hate! It's easy if you follow it carefully, but more often
than not people stuff it up. This is where the segment registers come into
play.
Memory is accessed using a combination of two 16 bit registers... the segment
and the offset... Valid combinations include: CS:IP (for where to get the
next instruction from), SS:SP (stack location), DS:SI, ES:DI and more... Note
that a SEGMENT register must come first (CS, DS, ES, SS) - you can't do AX:DI
- it just isn't allowed. This is a hardware restriction, but in practice it's
not a hassle.
Here's the math for working out which address you're at...
The segment registers point to the start of a 64k "Chunk" of RAM, whilst the
offset points to the byte within that chunk.
(All addresses in HEX notation)
You can have many combinations that relate to the same physical address...
Thus: 0000:0401 is the same as 0040:0001, f000:a000 is the same as fa00:0000
Addition is performed inside the CPU to work things out thus:
Segment register:          0000         0040
plus offset register:       0401         0001
-------------------------------------------------
equals:                    00401        00401
-------------------------------------------------
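Note how the result is 5 hex digits long - that's 20 bits in binary. The
segment is moved one hex digit (4 bits) to the left before the offset is
added, which is the same as multiplying it by 16. So a segment can start on
any 16 byte boundary, and the 16 bit offset reaches 64k from there.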
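
The same sum as a few lines of Python, in case it helps. The function name
physical_address is made up for this sketch. The wrap at 20 bits is how the
8088 and 8086 behave, since they only have 20 address lines.

```python
def physical_address(segment, offset):
    """Combine a 16 bit segment and a 16 bit offset into a 20 bit address."""
    # Move the segment one hex digit (4 bits) left, add the offset,
    # and keep only 20 bits of the result.
    return ((segment << 4) + offset) & 0xFFFFF


print(f"{physical_address(0x0000, 0x0401):05x}")   # prints 00401
print(f"{physical_address(0x0040, 0x0001):05x}")   # prints 00401
print(f"{physical_address(0xF000, 0xA000):05x}")   # prints fa000
print(f"{physical_address(0xFA00, 0x0000):05x}")   # prints fa000
```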
By the way, get used to hex, it's the standard way of referring to register
contents.. It's always a 4 DIGIT number for a 16 bit register, a 2 DIGIT
number for 8 bit, or a 5 digit number for 20 bit. The conversion is thus:
Binary:  1 0 1 0 | 0 0 0 1 | 1 1 0 0 | 1 0 0 1
Take the number in groups of 4 bits. A hex digit is base 16 - there are 16
possibilities per digit (decimal offers 10 [0-9]); hex has 0-9 and a-f [16
varieties].
You get 16 combinations in 4 bits - from 0 0 0 0 to 1 1 1 1 [0-f]
so the number above is: a1c9
Remember that each bit has a "weight" thus:
8 4 2 1 - weight
0 1 1 0 - hex number
To convert quickly, take a group of 4 bits, mentally add the weights of all "1"
bits - in this example 4+2 and the result is 6. The hex for this binary is 6.
Note that in hex, 9+1=a, not 10!!! SO:
1 1 1 1 = 8+4 (c) + 2 (e) +1 (f) = F hex.
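
Here's the weights trick written out in Python, as a rough sketch of the
mental arithmetic above. The names nibble_to_hex and binary_to_hex are made up
for the example (Python can of course do this conversion on its own, but then
you wouldn't see the weights at work).

```python
WEIGHTS = (8, 4, 2, 1)
HEX_DIGITS = "0123456789abcdef"


def nibble_to_hex(bits):
    """Convert four '0'/'1' characters to one hex digit by adding weights."""
    total = sum(weight for weight, bit in zip(WEIGHTS, bits) if bit == "1")
    return HEX_DIGITS[total]


def binary_to_hex(bits):
    """Convert a binary string (spaces allowed) to hex, 4 bits at a time."""
    bits = bits.replace(" ", "")
    if len(bits) % 4:
        raise ValueError("need a multiple of 4 bits")
    groups = (bits[i:i + 4] for i in range(0, len(bits), 4))
    return "".join(nibble_to_hex(group) for group in groups)


print(nibble_to_hex("0110"))                              # prints 6
print(binary_to_hex("1 0 1 0 0 0 0 1 1 1 0 0 1 0 0 1"))   # prints a1c9
```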
That is all the CPU provides for you to use!!! (And all you need)... Here's
how...
THE BASIC IBM PC
We begin our examples by looking at a basic IBM PC equipped with <say> a
floppy, a hard drive, some RAM and a video card, running MS-DOS.
OK, when your program is started, it is given access to all available memory,
from wherever DOS has currently used up to, through to the end of physical
memory. This could be as much as 600k or maybe even more under DOS 5.0, or as
little as 30 or 60k in a very small multitasking window. Your program has
permission to do anything to this block of memory, and its contents at load
time are garbage.
The program begins execution at the first instruction in your program (CS:IP
will initially point here) and wanders through, following the program to the
end. In the IBM PC the CS, IP, DS, ES and SS:SP are all preset for you to
valid, correct settings when your program is loaded. Further, CS, DS, ES (and
usually SS) will all be equal.
Because of the 64k segmentation limitation, everyone seems to do things in 64k
chunks, and DOS is no exception. Your program always begins at CS:0100 (the
first 100 hex, or 256 decimal, bytes are filled with information to be used by
the program if needed) and the SS:SP is usually placed at the very end of the
segment (ie SS=CS, SP=FFFE)
ABOUT THE STACK
The stack is vital to the operation of any program. It is for holding
temporary addresses and values during program execution, and can be used by
the user or the CPU at any time. Thus, a valid stack must always be
maintained. Whenever an instruction executes the equivalent of a BASIC GOSUB,
the address of where to go upon RETURN is saved on the stack. This is a 16 bit
number (the IP to return to), so stack entries are always 2 bytes, thus the
stack starts at FFFE and not FFFF. To store the address, the SP is first
decreased by 2, so it then points to FFFC, and the address is written there.
Don't ask why it grows downward, it just does.... the lower the SP, the bigger
the stack. Again, when the RETURN is executed, the SP has 2 added to it, and
again becomes FFFE.
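
A rough Python model of the SP moving down and back up, to go with the
description above. The Stack class and its dictionary "memory" are invented
for this sketch, and the return address 0103 is just an example value. On the
real chip the bytes live in RAM at SS:SP.

```python
class Stack:
    """Toy model of a downward growing stack inside one 64k segment."""

    def __init__(self, sp=0xFFFE):
        self.sp = sp        # stack pointer, starts at the end of the segment
        self.memory = {}    # offset -> byte (stands in for RAM at SS:offset)

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF                 # SP drops by 2 first
        self.memory[self.sp] = value & 0xFF              # low byte
        self.memory[(self.sp + 1) & 0xFFFF] = (value >> 8) & 0xFF  # high byte

    def pop(self):
        low = self.memory[self.sp]
        high = self.memory[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF                 # SP climbs back by 2
        return (high << 8) | low


stack = Stack()
stack.push(0x0103)              # the "GOSUB": save where to come back to
print(f"{stack.sp:04x}")        # prints fffc
print(f"{stack.pop():04x}")     # the "RETURN": prints 0103
print(f"{stack.sp:04x}")        # prints fffe
```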
More on the stack later.
Now it's time to see what is available to our program when it's run.
IBM thought they'd give us a set of interface routines for using the hardware
they'd built in. Nice of them that, saves us from directly manipulating the
hardware which is usually a tricky and weird task! These are called the BIOS
routines and are built into a ROM chip in the computer. They are responsible
for starting the computer when powered up, and also for loading the operating
system, MS-DOS.
DOS also supplies a set of routines for working with DOS - these are called the
DOS routines (No shit!) and are available whenever DOS is in memory.
There's stuff like reading and writing to disk, screen, etc., getting memory
sizes and all sorts. See a good ASM book for details - there are hundreds of
them and they take about 200 pages of text to fully cover - I'm not typing
that lot out again!!!
In fact, most of your programs will simply be loading up and calling these
routines... Here's a simple example (Type this into A86, it'll work !!)
; Demo program1
begin:  jmp start
string  db 'Hi there!!$'
start:  mov dx,offset string
        mov ah,09
        int 021
        int 020
Now that will be very confusing to you, but it's a simple program in assembler
(Can you guess what it does?) Let's look at it line by line.
; Demo program1 --- any text after a ; is ignored - it's a comment, a note
for whoever reads the source, not 8088 code. This could be left out without
any problem. It does not affect the size of the final code.
begin: jmp start - Here's our first instruction. Begin and start are LABELS
used by the assembler to refer to an address... note how there are no hardware
addresses written in... I could have said simply JMP CS:010F but it's much
easier to use a label. That way if I added more between begin: and start: I
would not have to recalculate the address. The assembler works out the address
at assembly time and substitutes it instead.
string db '.....$' - String is another label. db means define byte. This is
how we reserve memory. Everything between the quotes is stored into the
program and appears in memory at load time, referenced by the string label.
start: mov dx,offset string - this loads the DX register with the ADDRESS of
the string label. Note the word offset. This means you want the address of the
label, not what is at that address.
mov ah,09 - loads the AH register with 09 hex. This is needed by the next
instruction.
INT 021 - call MS-DOS's built in routines. They see that AH=09, decide that
you want the write string to screen routine, and display the text starting at
the location in DX (well, really DS:DX, but as I said, DS was set up for us
before the program began) until they see a $ symbol. The routine returns to
our program when the $ symbol is encountered. The $ is not written to the
screen.
INT 020 - call another DOS routine. This one ends our program and returns
control to the calling program (in most cases COMMAND.COM).
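
To show what the write string routine is doing with DX and the $ marker,
here's a rough Python imitation. It is only a model: dos_print_string is a
made up name, the bytearray stands in for the program's segment, and the
offset 0102 for the string is an assumed value picked for the example, not
something taken from a real assembly listing.

```python
def dos_print_string(memory, dx):
    """Imitate DOS function 09: collect characters from dx up to the '$'."""
    chars = []
    while memory[dx] != ord("$"):   # stop at the $ marker, don't print it
        chars.append(chr(memory[dx]))
        dx += 1
    return "".join(chars)


# Lay the string out the way it would sit in the program's segment.
memory = bytearray(0x200)
string_offset = 0x0102              # assumed offset, for illustration only
data = b"Hi there!!$"
memory[string_offset:string_offset + len(data)] = data

print(dos_print_string(memory, string_offset))   # prints Hi there!!
```

Leave the $ off the end of the string and this loop just keeps walking through
memory, which is what the real routine does too - it prints garbage until it
happens to hit a $ somewhere.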
To fully understand all about what the hell is going on with all these INT's,
I strongly suggest you invest in one of these books:
The Peter Norton Programmer's Guide to the IBM PC - Peter Norton. Try to get
edition #2 but if you can only get a first ed copy or one's going cheap grab
it - they're pretty good (I still use an ed.1 copy!)
Advanced MS-DOS - Ray Duncan. Only buy 2nd ed. 1st ed. was fairly limited and
not really worth the money - it lacks any coverage above DOS 3.0....
Nothing else is worth your money. I'll make the occasional page cross
reference (esp. to the Norton book which I feel is the better of the two) from
time to time..
ALSO
Scab from your favourite leeching BBS a copy of A86 V3.21 or later, and D86 to
go with it... this is the assembler I'll be using in the future... I'll
consider demonstrating MASM if you really want me to, but I don't know a hell
of a lot about it and don't really want to learn... I only know enough to know
what a basic program might need.
This brings lesson 1 pretty much to a close... get yourself one of these
books, delve into it, get A86, type in the demo, absorb as much as you can
then write me back with your questions and problems!
I'll be starting lesson 2 soon!.... Cya there.
.\\erlin